Unsupervised Clustering of Morphologically Related Chinese Words

Authors

  • Chia-Ling Lee
  • Ya-Ning Chang
  • Chao-Lin Liu
  • Chia-Ying Lee
  • Jane Yung-jen Hsu
Abstract

Many linguists consider morphological awareness a major factor in children's reading development. A Chinese character embedded in different compound words may carry related but different meanings. For example, “商店(store)”, “商品(commodity)”, “商代(Shang Dynasty)”, and “商朝(Shang Dynasty)” form two clusters: {“商店”, “商品”} and {“商代”, “商朝”}. In this paper, we aim at unsupervised clustering of a given family of morphologically related Chinese words. Successfully differentiating these words can contribute to both computer-assisted Chinese learning and natural language understanding. In Experiment 1, we combined linguistic factors at the word, syntactic, semantic, and contextual levels using aggregated computational linguistics methods to perform the clustering task. In Experiment 2, we recruited adults and children to perform the same clustering task. Experimental results indicate that our computational model achieved the same level of performance as the children.
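The abstract does not spell out how the word-, syntactic-, semantic-, and contextual-level factors are aggregated, so the following is only a minimal sketch of the general task, not the authors' method. It swaps in a single contextual-level signal (Jaccard overlap between sets of co-occurring words) and a generic average-linkage agglomerative clustering that keeps merging until no pair of clusters is similar enough. The context-word sets in `toy_contexts`, the function names, and the `0.3` threshold are all hypothetical, chosen only to reproduce the 商 example.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two context-word sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def average_link_clusters(contexts: dict[str, set[str]], threshold: float) -> list[list[str]]:
    """Average-linkage agglomerative clustering over context-set similarity.

    Starts with one cluster per word and repeatedly merges the pair of
    clusters with the highest mean pairwise similarity, stopping when that
    best similarity falls below `threshold` (so the number of clusters is
    not fixed in advance).
    """
    clusters = [[w] for w in contexts]

    def link(c1: list[str], c2: list[str]) -> float:
        # Mean similarity over all cross-cluster word pairs.
        sims = [jaccard(contexts[x], contexts[y]) for x in c1 for y in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        (i, j), best = max(
            (((i, j), link(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        clusters[i].extend(clusters[j])
        del clusters[j]  # safe: combinations() guarantees j > i
    return clusters


# Hypothetical toy contexts, NOT data from the paper.
toy_contexts = {
    "商店": {"買", "東西", "老闆"},
    "商品": {"買", "東西", "價格"},
    "商代": {"歷史", "王朝", "甲骨文"},
    "商朝": {"歷史", "王朝", "滅亡"},
}

result = average_link_clusters(toy_contexts, threshold=0.3)
# result == [["商店", "商品"], ["商代", "商朝"]]
```

A threshold-based stopping rule is used here because the number of senses in a word family is not known beforehand; in a fuller system the single Jaccard score would presumably be replaced by some combination of the several feature levels the abstract lists.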

Similar resources

Semantical Clustering of Morphologically Related Chinese Words

A Chinese character embedded in different compound words may carry different meanings. In this paper, we aim at semantic clustering of a given family of morphologically related Chinese words. In Experiment 1, we employed linguistic features at the word, syntactic, semantic, and contextual levels in aggregated computational linguistics methods to handle the clustering task. In Experiment 2, we r...

Semantic Clustering of Morphologically Related Chinese Words

A Chinese character embedded in different compound words may carry different meanings. In this paper, we aim at semantic clustering of a given family of morphologically related Chinese words. In Experiment 1, we employed linguistic features at the word, syntactic, semantic, and contextual levels in aggregated computational linguistics methods to handle the clustering task. In Experiment 2, we r...

Unsupervised Sense Clustering of Related Chinese Words

Chinese words which share the same character may carry related but different meanings, e.g., “花錢(spend)”, “花費(expend)”, “花園(garden)”, “開花(bloom)”. The semantics of these words form two clusters: {“花錢(spend)”, “花費(expend)”} and {“花園(garden)”, “開花(bloom)”}. In this paper, we aim at unsupervised clustering of a given set of such related Chinese words, where the quality of clustering results is t...

An Unsupervised Approach to Chinese Word Sense Disambiguation Based on Hownet

The research on word sense disambiguation (WSD) has great theoretical and practical significance in many fields of natural language processing (NLP). This paper presents an unsupervised approach to Chinese word sense disambiguation based on Hownet (an electronic Chinese lexical resource). In our approach, contexts that include ambiguous words are converted into vectors by means of a second-orde...

Statistical Stemming for Kannada

Stemming is a process that groups morphologically related words into the same class and is widely used in information retrieval for improving recall rate. Here we study a set of statistical stemmers for Kannada, a resource-poor language with highly inflectional and agglutinative morphology. We compare stemming using simple truncation, clustering and an unsupervised morpheme segmentation algorit...

Journal title:

Volume:   Issue:

Pages:  -

Publication date: 2014